

Section: New Results

Data Search

Adversarial Autoencoders For Novelty Detection

Participants : Valentin Leveau, Alexis Joly.

In this work [40], we addressed the problem of novelty detection, i.e., recognizing at test time whether a data item comes from the training data distribution or not. We focused on adversarial autoencoders (AAE), which have the advantage of explicitly controlling the distribution of the known data in the feature space. We showed that, when trained in a (semi-)supervised way, they provide consistent novelty detection improvements compared to a classical autoencoder. We further improved their performance by introducing an explicit rejection class in the prior distribution, coupled with random images fed as input to the autoencoder.
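As a rough illustration of how a controlled latent distribution can be turned into a novelty decision, the sketch below assumes that the AAE prior is a mixture of isotropic Gaussians with one component per known class, plus one extra component standing in for the rejection class, and that the encoder output is already available as a latent vector. The component centres, the shared variance `sigma` and the function names are hypothetical choices for illustration, not the exact setup of [40].

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def log_gaussian(z: Vector, mean: Vector, sigma: float) -> float:
    """Log-density of an isotropic Gaussian N(mean, sigma^2 I) evaluated at z."""
    d = len(z)
    sq_dist = sum((zi - mi) ** 2 for zi, mi in zip(z, mean))
    return (-0.5 * sq_dist / sigma ** 2
            - d * math.log(sigma)
            - 0.5 * d * math.log(2 * math.pi))


def novelty_decision(z: Vector,
                     class_means: Dict[str, Vector],
                     reject_mean: Vector,
                     sigma: float = 1.0) -> Tuple[str, float]:
    """Classify a latent code z, or flag it as novel (hypothetical scoring rule).

    Returns (label, novelty_score). The score is the negative log-density of the
    best-matching known-class component, so it grows as z moves away from every
    known class. The label is "novel" when the rejection component explains z
    better than any known-class component.
    """
    log_dens = {label: log_gaussian(z, m, sigma) for label, m in class_means.items()}
    best_label = max(log_dens, key=log_dens.get)
    best = log_dens[best_label]
    reject = log_gaussian(z, reject_mean, sigma)
    label = "novel" if reject > best else best_label
    return label, -best


# Hypothetical 2-D latent space: two known classes and one rejection component.
means = {"class_a": [4.0, 0.0], "class_b": [-4.0, 0.0]}
reject_centre = [0.0, 4.0]
label_known, score_known = novelty_decision([3.8, 0.1], means, reject_centre)  # "class_a"
label_new, score_new = novelty_decision([0.2, 3.5], means, reject_centre)      # "novel"
```

Because the adversarial training forces the encoder to map known data onto the chosen prior, a point that lands near none of the class components (or near the rejection component) can be flagged without a separate density model; the thresholding rule here is only one possible choice.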

Going deeper in the automated identification of Herbarium specimens

Participants : Alexis Joly, Herve Goeau.

Hundreds of herbarium collections have accumulated a valuable heritage and knowledge of plants over several centuries. Recent initiatives have started ambitious preservation plans to digitize this information and make it available to botanists and the general public through web portals. However, thousands of sheets are still unidentified at the species level, while numerous sheets should be reviewed and updated following more recent taxonomic knowledge. These annotations and revisions would require an unrealistic amount of work for botanists to carry out in a reasonable time. Computer vision and machine learning approaches applied to herbarium sheets are promising but remain poorly studied compared to automated species identification from leaf scans or from pictures of plants in the field. In this work [14], we studied and evaluated the accuracy with which herbarium images can be exploited for species identification with deep learning. In addition, we studied whether combining herbarium sheets with photos of plants in the field is beneficial in terms of accuracy, and finally, we explored whether herbarium images from a region with a specific flora can be used for transfer learning to another region with other species, for example a region that is under-represented in terms of collected data. To our knowledge, this is the first study that uses deep learning to analyze a large herbarium dataset covering thousands of species. Results show the potential of deep learning for herbarium species identification, in particular when training and testing across different datasets from different herbaria. This could lead to semi- or even fully automated systems that help taxonomists and experts with their annotation, classification, and revision work.

Crowdsourcing Thousands of Specialized Labels: a Bayesian active training approach

Participants : Maximilien Servajean, Alexis Joly, Dennis Shasha, Julien Champ, Esther Pacitti.

The use of crowdsourced and, more generally, user-generated annotations has become the de facto methodology for building training data in a variety of data indexing and search tasks. When the labels correspond to well-known or easy-to-learn concepts, it is straightforward to train the annotators by giving them a few examples with known answers. This is no longer the case when there are thousands of complex, domain-specific labels. In this work, we focused on the particular case of crowdsourcing domain-specific annotations that usually require specialized expert knowledge (such as plant species names, architectural styles, medical diagnostic tags, etc.). We assumed that common knowledge is not sufficient to perform the task, but that anyone can be taught to recognize a small subset of domain-specific concepts. In such a context, it is best to take advantage of the various capabilities of each annotator through teaching (annotators can enhance their knowledge), assignment (annotators can be focused on tasks they have the knowledge to complete) and inference (propositions from different annotators can be aggregated to improve labeling quality). In this work [20], we proposed a set of data-driven algorithms to (i) train image annotators on how to disambiguate among automatically generated candidate labels, (ii) evaluate the quality of annotators’ label suggestions and (iii) weight their predictions. The algorithms adapt to the skills of each annotator, both in the questions asked and in the weights given to their answers. The underlying judgements are Bayesian, based on adaptive priors. We measured the benefits of these algorithms in a live user experiment on image-based plant identification involving around 1,000 people (the origin of ThePlantGame, see Software section). The proposed methods yield large gains in annotation accuracy: while a standard user could correctly label around 2% of our data, this rises to 80% with machine-learning-assisted training and to almost 90% with a weighted combination of several annotators’ labels.
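To make the idea of Bayesian annotator weighting concrete, the sketch below uses a common textbook scheme that is not necessarily the one in [20]: each annotator's accuracy gets a Beta prior that is updated on questions with known answers, and each answer is then weighted by its log-odds under a symmetric noise model over the K candidate labels. The class and function names, the uniform Beta(1, 1) prior and the noise model are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AnnotatorSkill:
    """Beta(alpha, beta) belief about the probability that an annotator answers correctly."""
    alpha: float = 1.0  # prior pseudo-count of correct answers (assumed uniform prior)
    beta: float = 1.0   # prior pseudo-count of wrong answers

    def update(self, correct: bool) -> None:
        # Bayesian (Beta-Bernoulli) update after a question with a known answer.
        if correct:
            self.alpha += 1
        else:
            self.beta += 1

    def accuracy(self) -> float:
        # Posterior mean of the annotator's accuracy.
        return self.alpha / (self.alpha + self.beta)


def vote_weight(p: float, n_labels: int) -> float:
    """Log-odds weight of an answer, assuming errors spread evenly over the other labels."""
    p = min(max(p, 1e-6), 1 - 1e-6)  # clamp to avoid infinite weights
    return math.log(p * (n_labels - 1) / (1 - p))


def aggregate(answers: List[Tuple[str, str]],
              skills: Dict[str, AnnotatorSkill],
              n_labels: int) -> str:
    """Weighted vote over (annotator_id, proposed_label) pairs for one image."""
    totals: Dict[str, float] = defaultdict(float)
    for annotator, label in answers:
        totals[label] += vote_weight(skills[annotator].accuracy(), n_labels)
    return max(totals, key=totals.get)


# Hypothetical example: one reliable annotator and two weak ones, 4 candidate labels.
skills = {"expert": AnnotatorSkill(), "novice1": AnnotatorSkill(), "novice2": AnnotatorSkill()}
for _ in range(8):
    skills["expert"].update(True)           # posterior mean 9 / 10 = 0.9
for name in ("novice1", "novice2"):
    for outcome in (True, False, False):
        skills[name].update(outcome)        # posterior mean 2 / 5 = 0.4

answers = [("expert", "A"), ("novice1", "B"), ("novice2", "B")]
decision = aggregate(answers, skills, n_labels=4)  # "A": log(27) outweighs 2 * log(2)
```

With K candidate labels, an annotator whose estimated accuracy is at chance level (p = 1/K) receives a weight of zero, so a single reliable annotator can outvote several weak ones; the adaptive prior simply means that each new answer with a known solution shifts the annotator's weight.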

Evaluation of Content-Based Biodiversity Identification techniques

Participants : Alexis Joly, Herve Goeau, Jean-Christophe Lombardo.

We ran a new edition of the LifeCLEF evaluation campaign [26] with the involvement of 15 research teams working on content-based biodiversity identification worldwide. The main novelties of the 2017 edition of LifeCLEF compared to the previous years were the following:

Pl@ntNet Business Venture proposal

Participants : Alexis Joly, Herve Goeau, Antoine Affouard, Jean-Christophe Lombardo.

In 2017, the ACM Multimedia conference (rank A) introduced a new "Business Venture Track" soliciting business venture proposals built on multimedia technology. The aim is to bridge the gap between academia and industry in multimedia research, innovation and application. The track was open to submissions from all multimedia researchers and entrepreneurs. In this context, we worked on a business venture proposal around the Pl@ntNet project, which has been accepted for publication [25]. Our business proposal is to allow enterprises or organizations to set up their own private collaborative workflow within the Pl@ntNet information system. The main added value is to allow them to work on their own business object (e.g., plant disease diagnosis, deficiency measurements, railway line maintenance, etc.) with their own community of contributors and end users (employees, sales representatives, clients, observer networks, etc.). This business idea answers a growing demand in agriculture and environmental economics. Actors in these domains acknowledge that machine learning techniques are mature enough, but the lack of training data and of efficient tools to collect them remains a major problem. A collaborative platform like Pl@ntNet, extended with the technical innovations presented in the proposal, is the ideal tool to bridge this gap. It would initiate a powerful positive feedback loop, boosting the production of training data while improving the work of the employees.